Introducing a new approach for modeling a given time series based on attributing any random variation to a jump event: jump-jump modeling

When analyzing data sampled at discrete times, one encounters successive discontinuities in the trajectory of the sampled time series, even if the underlying path is continuous. Moreover, distinguishing discontinuities caused by the finite sampling of a continuous stochastic process from genuine discontinuities in the sample path is one of the main problems. Clues like these led us to the question: is it possible to provide a model that treats any random variation in the data set as a jump event, regardless of whether the given time series is classified as a diffusion or a jump-diffusion process? To address this question, we wrote a new stochastic dynamical equation that includes a drift term and a combination of Poisson jump processes with differently distributed sizes. In this article, we first introduce this equation in its simplest form, with a drift term and a single jump process, and show that such a jump-drift equation is able to describe the discrete-time evolution of a diffusion process. We then extend the model by including more jump processes in the equation, which can be used to model complex systems with various distributed amplitudes. At each step, we also show that all the unknown functions and parameters required for modeling can be obtained non-parametrically from the measured time series.


The Langevin equation
The Langevin equation is widely used to model continuous diffusion processes. Langevin dynamics produces a continuous sample path and, using Itô calculus for stochastic integrals 17,18, has the following form:
$$ dx(t) = D^{(1)}(x,t)\,dt + \sqrt{D^{(2)}(x,t)}\,dW(t). \tag{3} $$
In this equation, x(t) is the state variable of the process, and {W(t), t ≥ 0} is a scalar Wiener process. Additionally, $D^{(1)}(x,t)$ and $D^{(2)}(x,t)$ denote the first- and second-order KM coefficients, known as the drift and diffusion terms, respectively, and are obtained from Eq. (2) as follows:
$$ D^{(1)}(x,t) = M^{(1)}(x,t), \qquad D^{(2)}(x,t) = M^{(2)}(x,t). \tag{4} $$
These coefficients are estimated directly from measured time series.
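As a minimal illustration of Eqs. (3) and (4), the following Python/NumPy sketch simulates an Ornstein-Uhlenbeck path ($D^{(1)}(x) = -x$, $D^{(2)}(x) = 1$) with the Euler-Maruyama scheme and estimates $M^{(1)}$ and $M^{(2)}$ by conditioning the increments on equal-count bins of x(t). The helper names `simulate_langevin` and `km_coefficient`, the binning choice and the series length are assumptions made for illustration, not part of the original method.

```python
import math
import numpy as np

def simulate_langevin(drift, diffusion, x0, dt, n_steps, seed=0):
    """Euler-Maruyama discretization of dx = D1(x) dt + sqrt(D2(x)) dW (Eq. 3)."""
    rng = np.random.default_rng(seed)
    dW = rng.normal(0.0, math.sqrt(dt), n_steps).tolist()  # Wiener increments
    x = [x0]
    for dw in dW:
        xi = x[-1]
        x.append(xi + drift(xi) * dt + math.sqrt(diffusion(xi)) * dw)
    return np.array(x)

def km_coefficient(x, dt, n, n_bins=30):
    """Bin-wise estimate of M^(n)(x) = <[x(t+dt) - x(t)]^n | x(t) = x> / dt."""
    x_now, dx = x[:-1], np.diff(x)
    # equal-count bins between the 1% and 99% quantiles of x(t)
    edges = np.quantile(x_now, np.linspace(0.01, 0.99, n_bins + 1))
    idx = np.digitize(x_now, edges) - 1          # bin index of each sample
    centers, moments = [], []
    for b in range(n_bins):
        mask = idx == b
        centers.append(x_now[mask].mean())
        moments.append(np.mean(dx[mask] ** n) / dt)
    return np.array(centers), np.array(moments)

dt = 1e-3
x = simulate_langevin(lambda v: -v, lambda v: 1.0, 0.0, dt, 300_000)
centers, D1_hat = km_coefficient(x, dt, 1)   # D^(1) = M^(1), Eq. (4)
_, D2_hat = km_coefficient(x, dt, 2)         # D^(2) = M^(2), Eq. (4)
slope = np.polyfit(centers, D1_hat, 1)[0]    # should be close to -1
print(f"fitted drift slope ~ {slope:.2f}, mean D2 ~ {D2_hat.mean():.3f}")
```

Equal-count (quantile) bins are used so that every bin holds roughly the same number of increments; fixed-width bins would leave the tails of x(t) sparsely populated.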

The jump-diffusion equation
Many stochastic processes are not continuous [19][20][21][22], so the use of the Langevin equation is not justified for them. In general, a non-vanishing $M^{(4)}(x,t)$ indicates discontinuities in the trajectory of the time series, meaning that jump events play an important role in the underlying process. It is therefore necessary to generalize the Langevin equation to model discontinuous processes. One of the best generalizations of the classical Langevin equation that can create a discontinuous sample path is 8,10:
$$ dx(t) = D^{(1)}_j(x,t)\,dt + \sqrt{D^{(2)}_j(x,t)}\,dW(t) + \xi\,dJ(t), \tag{5} $$
where again {W(t), t ≥ 0} is a scalar Wiener process, $D^{(1)}_j(x,t)$ and $D^{(2)}_j(x,t)$ are the deterministic drift and diffusion coefficients (the index j denotes jumpy behavior and distinguishes these KM coefficients from those defined for continuous processes), and J(t) is a Poisson jump process 23. The jump has rate $\lambda(x,t)$ and size ξ, which can have any symmetric distribution with finite even-order statistical moments, e.g. a Gaussian distribution. It has been shown that all the coefficients and parameters required in this model can be found directly from the measured time series by estimating the KM coefficients as follows 8,10:
$$ M^{(1)}(x,t) = D^{(1)}_j(x,t), \qquad M^{(2)}(x,t) = D^{(2)}_j(x,t) + \lambda(x,t)\langle\xi^2\rangle, \qquad M^{(n)}(x,t) = \lambda(x,t)\langle\xi^n\rangle \quad (n > 2). \tag{6} $$
Assuming that ξ is a Gaussian random variable, i.e. $\xi \sim N(0, \sigma_\xi^2)$, and using the relation $\langle\xi^{2l}\rangle = \frac{(2l)!}{2^l\,l!}\langle\xi^2\rangle^l$ for Gaussian random variables in the last relation of Eq. (6) with n = 4 and n = 6, the jump amplitude $\sigma_\xi^2(x,t)$ and the jump rate $\lambda(x,t)$ are estimated as:
$$ \sigma_\xi^2(x,t) = \frac{M^{(6)}(x,t)}{5\,M^{(4)}(x,t)}, \qquad \lambda(x,t) = \frac{M^{(4)}(x,t)}{3\,\sigma_\xi^4(x,t)}. \tag{7} $$
Once the jump parameters $\sigma_\xi^2(x,t)$ and $\lambda(x,t)$ are obtained, the second relation of Eq. (6) determines the diffusion coefficient $D^{(2)}_j(x,t)$. In addition, the first relation in Eq. (6) gives $D^{(1)}_j(x,t)$ by estimating $M^{(1)}(x,t)$ from the data. There are numerous studies using the jump-diffusion Eq. (5) to describe the random evolution of neuron dynamics 19,20, stochastic resonance 21 and climate data 22.
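The estimators in Eqs. (6) and (7) can be checked on synthetic data. The sketch below assumes constant coefficients (zero drift, $D^{(2)}_j = 1$, λ = 20 per unit time, $\sigma_\xi^2 = 0.5$, all chosen only for illustration) so that the KM coefficients can be pooled over all x instead of being estimated bin by bin; dJ is drawn in each step as a Bernoulli variable with success probability λ dt.

```python
import numpy as np

# Illustrative constant coefficients (assumed, not taken from the text)
rng = np.random.default_rng(1)
dt, n_steps = 1e-3, 2_000_000
D2, lam, sigma2 = 1.0, 20.0, 0.5

dW = rng.normal(0.0, np.sqrt(dt), n_steps)       # diffusive increments
dJ = rng.random(n_steps) < lam * dt              # Bernoulli(lam*dt) jump indicators
xi = rng.normal(0.0, np.sqrt(sigma2), n_steps)   # Gaussian jump sizes
dx = np.sqrt(D2) * dW + xi * dJ                  # increments of Eq. (5), zero drift

# KM coefficients M^(n) = <dx^n> / dt, pooled over x because coefficients are constant
M = {n: np.mean(dx ** n) / dt for n in (2, 4, 6)}

sigma2_hat = M[6] / (5.0 * M[4])                 # jump amplitude, Eq. (7)
lam_hat = M[4] / (3.0 * sigma2_hat ** 2)         # jump rate, Eq. (7)
D2_hat = M[2] - lam_hat * sigma2_hat             # second relation of Eq. (6)
print(f"sigma2 ~ {sigma2_hat:.3f}, lambda ~ {lam_hat:.1f}, D2_j ~ {D2_hat:.2f}")
```

In this setting $D^{(2)}_j$ is recovered as the difference $M^{(2)} - \lambda\sigma_\xi^2$ of two larger numbers, so it is the least precise of the three estimates for a given series length.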

Distinguishing between purely diffusive and jump-diffusion processes
In the sample paths of many empirical time series, fluctuations are often interrupted by sudden large-amplitude jumps between different states of a system 24. Studies have shown that empirical detection of jumps is difficult because, in the real world, only discretely sampled data from continuous-time models are available. In general, when data are sampled at discrete intervals, a sequence of discontinuous jump events appears in the sampled path, even though the underlying path is continuous. Studies of higher-order temporal approximations of the KM conditional moments have shown that a finite sampling interval τ affects all the KM coefficients [25][26][27]. These studies found that even for diffusive processes, non-vanishing higher-order (> 2) conditional moments can originate from discrete sampling. Therefore, the Pawula theorem cannot be used to judge whether a given time series is a diffusion process or a jump-diffusion process. This means that, when analyzing empirical time series, one cannot immediately be sure whether Eq. (3) or Eq. (5) is appropriate for modeling the corresponding time series, unless one uses the diagnostic criteria presented in this context. For Langevin and jump-diffusion dynamics, there are criteria that can be used to check whether a given time series is inherently continuous or discontinuous. Two widely used criteria are:
1- The first criterion for distinguishing between purely diffusive and jump-diffusion processes is the ratio $K^{(4)}(x,\tau)/[3(K^{(2)}(x,\tau))^2]$, introduced by Lehnertz et al. 11. Their results show that this ratio is close to 1 for diffusive processes at small τ, whereas for jump-diffusion processes it diverges as 1/τ, namely:
$$ \frac{K^{(4)}(x,\tau)}{3\,(K^{(2)}(x,\tau))^2} \simeq \begin{cases} 1, & \text{diffusive process},\\ \propto 1/\tau, & \text{jump-diffusion process}. \end{cases} \tag{8} $$
As can be seen, using $K^{(2)}(x,\tau)$ and $K^{(4)}(x,\tau)$ to detect jumps can be problematic in the range of small time intervals τ. In such a case, the next criterion can be used.
2- The second criterion for distinguishing diffusive from jump-diffusion processes is based on the ratio of the fourth- and sixth-order KM conditional moments, known as the Q-ratio, which was introduced in the same article 11 by Lehnertz et al. as follows:
$$ Q(x,\tau) = \frac{K^{(6)}(x,\tau)}{5\,K^{(4)}(x,\tau)}. \tag{9} $$
Using an expansion of the KM conditional moments in τ, they found that when the process is purely diffusive, $Q(x,\tau) = D^{(2)}(x)\,\tau$ (linear in τ), while when the process has jumpy behavior, $Q(x,\tau) = \sigma_\xi^2$ (constant and independent of τ), where $D^{(2)}(x)$ is the diffusion coefficient and $\sigma_\xi^2$ is the jump amplitude in jump-diffusion modeling.
In summary, by estimating the Q-ratio from the data, one can determine which dynamical equation is appropriate for modeling the given time series:
$$ Q(x,\tau) \simeq \begin{cases} D^{(2)}(x)\,\tau, & \text{diffusive process},\\ \sigma_\xi^2, & \text{jump-diffusion process}. \end{cases} $$
In the next sections, after introducing our dynamical stochastic equation, we also define a new criterion to differentiate diffusion processes from jump-diffusion processes.
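Both criteria, Eqs. (8) and (9), can be evaluated over several sampling intervals τ. The sketch below is a simplification that pools the conditional moments over all x (adequate here only because the assumed coefficients are constant) and compares a Brownian path ($D^{(2)} = 1$) with a jump-diffusion path (λ = 20, $\sigma_\xi^2 = 0.5$). These values and the helper `lehnertz_ratios` are illustrative assumptions.

```python
import numpy as np

def lehnertz_ratios(x, lags):
    """For each lag k (in samples, tau = k*dt) return K4/(3 K2^2) and Q = K6/(5 K4).

    Simplification: moments of x(t+tau) - x(t) are pooled over all x(t)."""
    ratio, q = [], []
    for k in lags:
        d = x[k:] - x[:-k]
        K2, K4, K6 = (np.mean(d ** n) for n in (2, 4, 6))
        ratio.append(K4 / (3.0 * K2 ** 2))
        q.append(K6 / (5.0 * K4))
    return np.array(ratio), np.array(q)

rng = np.random.default_rng(2)
dt, n_steps = 1e-3, 1_000_000
lags = [1, 2, 4, 8]

# diffusive: Brownian motion with D2 = 1 (zero drift assumed)
x_diff = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))
# jump-diffusion: D2 = 1 plus Gaussian jumps with lam = 20, sigma_xi^2 = 0.5 (assumed)
jumps = rng.normal(0.0, np.sqrt(0.5), n_steps) * (rng.random(n_steps) < 20.0 * dt)
x_jump = np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps) + jumps)

r_diff, q_diff = lehnertz_ratios(x_diff, lags)
r_jump, q_jump = lehnertz_ratios(x_jump, lags)
print("diffusive:", r_diff.round(2), q_diff)
print("jumpy:    ", r_jump.round(2), q_jump.round(3))
```

For the diffusive path the first ratio should stay near 1 while Q grows roughly in proportion to τ; for the jumpy path the first ratio should be well above 1 at the smallest τ while Q changes only weakly with τ.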

Introducing the proposed method and results
As mentioned, when a time series is sampled at time intervals τ, successive discontinuities are observed in the sampled path even though the underlying path is continuous 11. In addition, one of the main problems when using data sampled at discrete times is distinguishing discontinuities caused by the finite sampling of continuous stochastic processes from genuine discontinuities in the sample path of the time series 11. Points like these raise the question: is it possible to describe the random evolution of a time series sampled at finite time intervals τ using only jump-drift processes? To address this question, we introduce a new model that attributes any stochastic variation in the sample path of a given time series to a jump event, regardless of whether the underlying trajectory is continuous. On this basis, we build a new dynamical stochastic equation, which we call the jump-jump equation, and which in its general form includes a deterministic drift term and several stochastic terms with jumpy behavior:
$$ dx(t) = D^{(1)}(x,t)\,dt + \xi_1\,dJ_1(t) + \xi_2\,dJ_2(t) + \cdots, \tag{10} $$
where $D^{(1)}(x,t)$ is the deterministic part of the process and $J_1(t), J_2(t), \ldots$ are Poisson jump processes. The jumps have rates $\lambda_1, \lambda_2, \ldots$ and sizes $\xi_1, \xi_2, \ldots$, which we assume to follow zero-mean Gaussian distributions with variances (amplitudes) $\sigma_{\xi_1}^2, \sigma_{\xi_2}^2, \ldots$, respectively. We start with the simplest form of Eq. (10), which includes a drift term and a single jump process, and show that such a jump-drift equation is able to model time series that are classified as continuous processes. We then extend the model by including more jump processes in Eq. (10) and use it to model time series with more varied amplitudes. At each step, we demonstrate that all unknown coefficients and functions involved in this model can be derived directly from the measured time series.

Jump-drift modeling
We now consider Eq. (10) with a drift term and a single jump process (a jump-drift equation) and show that it can be used to model time series belonging to the class of continuous processes when the data are sampled at discrete intervals. The general form of a jump-drift equation is:
$$ dx(t) = D^{(1)}(x,t)\,dt + \xi\,dJ(t), \tag{11} $$
where $D^{(1)}(x,t)$ denotes the drift part of the process, and J(t) is a Poisson jump process characterized by the rate $\lambda(x,t)$ and the size ξ. We assume that ξ is a zero-mean Gaussian random variable, i.e. $\xi \sim N(0, \sigma_\xi^2)$. The variance of this distribution, $\sigma_\xi^2$, is called the jump amplitude and in general may depend on x and t. We will show that all unknown parameters and functions required in this model can be estimated in a data-driven way from measured time series. Before doing so, two points should be mentioned: 1) We assume that J(t) is a homogeneous Poisson jump process with a constant jump rate λ. The jump rate is the expected number of jumps per unit time, so the number of jumps occurring in the interval (t, t + dt] follows a Poisson distribution with parameter λdt. On the other hand, in each infinitesimal dt a jump event has two possible states, occurrence (1) and non-occurrence (0), only one of which is realized. Hence, in the Poisson process, the occurrence of an event in each small time interval is a Bernoulli variable: dJ takes only the values 1 and 0, with probabilities λdt and 1 − λdt, respectively. 2) To first order in dt, the statistical moments of dJ are given by 8,10:
$$ \langle dJ^n(t)\rangle = \lambda\,dt, \qquad n = 1, 2, \ldots $$
With these two points in mind, we now present a data-driven approach to estimate the drift and jump properties required in this model. The method applies to both stationary and non-stationary time series.
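The Bernoulli picture of dJ described above translates directly into a simulation of Eq. (11). This sketch assumes a constant jump rate and amplitude (λ = 50 per unit time, $\sigma_\xi^2 = 0.5$, drift $D^{(1)}(x) = -x$, all illustrative) and a simple Euler-type update; `simulate_jump_drift` is a hypothetical helper.

```python
import numpy as np

def simulate_jump_drift(drift, lam, sigma2, x0, dt, n_steps, seed=0):
    """Simulate Eq. (11), dx = D1(x) dt + xi dJ, on a grid of step dt.

    Assumes a constant rate lam (per unit time) and amplitude sigma2:
    dJ ~ Bernoulli(lam * dt) in each step and xi ~ N(0, sigma2)."""
    rng = np.random.default_rng(seed)
    dJ = (rng.random(n_steps) < lam * dt).astype(float)   # 1 = jump, 0 = no jump
    xi = rng.normal(0.0, np.sqrt(sigma2), n_steps)         # jump sizes
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = x[i] + drift(x[i]) * dt + xi[i] * dJ[i]
    return x, dJ

dt, lam = 1e-3, 50.0                       # illustrative values: lam * dt = 0.05
x, dJ = simulate_jump_drift(lambda v: -v, lam, 0.5, 0.0, dt, 100_000)

# dJ only takes the values 0 and 1, so dJ**n == dJ and all moments coincide (~ lam * dt)
moments = [float(np.mean(dJ ** n)) for n in (1, 2, 3, 4)]
print(moments)
```

Because dJ takes only the values 0 and 1, every power of dJ equals dJ itself, which is why all the moments ⟨dJ^n⟩ coincide.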

Non-parametric estimation of jump-drift processes
Theorem 1 For a jump-drift process described by the dynamical Eq. (11), all the functions and parameters required to model the process can be estimated non-parametrically from the KM coefficients of the measured time series as follows:
$$ M^{(1)}(x,t) = D^{(1)}(x,t), \qquad M^{(n)}(x,t) = \lambda(x,t)\,\langle\xi^n(x,t)\rangle \quad (n \ge 2). \tag{12} $$
We provide a proof of this theorem in the appendix. For non-stationary processes, all functions and parameters are time-dependent; in the following, we focus on stationary processes and omit the t-dependence in Eq. (12) for readability.
We can estimate the drift function $D^{(1)}(x)$ using the first relation in (12). The jump amplitude $\sigma_\xi^2(x)$ and the jump rate $\lambda(x)$ can be estimated by applying the relation $\langle\xi^{2l}\rangle = \frac{(2l)!}{2^l\,l!}\langle\xi^2\rangle^l$ for the Gaussian random variable ξ to the last relation in Eq. (12) with n = 2 and n = 4. This gives $M^{(2)}(x) = \lambda(x)\sigma_\xi^2(x)$ and $M^{(4)}(x) = 3\lambda(x)\sigma_\xi^4(x)$, and therefore:
$$ \sigma_\xi^2(x) = \frac{M^{(4)}(x)}{3\,M^{(2)}(x)}, \qquad \lambda(x) = \frac{M^{(2)}(x)}{\sigma_\xi^2(x)} = \frac{3\,[M^{(2)}(x)]^2}{M^{(4)}(x)}. \tag{13} $$
We now argue that if Eq. (11) is able to describe the random evolution of a sampled time series x(t) belonging to the class of diffusion processes, then the following conditions should hold:
1. In terms of the conditional moments $K^{(n)}(x)$, the last relation in (12) reads
$$ K^{(n)}(x) = \lambda(x)\,\langle\xi^n\rangle\,dt, \tag{14} $$
which for n = 2 and n = 4 gives $K^{(2)}(x) = \lambda(x)\sigma_\xi^2(x)\,dt$ and $K^{(4)}(x) = 3\lambda(x)\sigma_\xi^4(x)\,dt$. Forming the ratio of these relations leads to:
$$ \frac{K^{(4)}(x)}{3\,(K^{(2)}(x))^2} = \frac{1}{\lambda(x)\,dt}. $$
On the other hand, we know from Eq. (8) that this ratio is approximately equal to 1 for diffusion processes at small dt; as a result:
$$ \lambda(x)\,dt \simeq 1. \tag{15} $$
This criterion offers a way to verify the Pawula theorem numerically. Employing this measure, one can decide whether the given time series belongs to the class of diffusive processes: $\lambda(x)\,dt \simeq 1$ indicates a diffusive process, whereas for a jump-diffusion process the ratio grows as $1/dt$ according to Eq. (8), so that $\lambda(x)\,dt \ll 1$.
2. Comparing $K^{(2)}(x)$ in (14) with the second-order conditional moment of Langevin modeling, i.e. $K^{(2)}(x) = D^{(2)}(x)\,dt$, we obtain $D^{(2)}(x) = \lambda(x)\,\sigma_\xi^2(x)$. Applying the condition $\lambda(x)\,dt = 1$ for diffusion processes leads to:
$$ D^{(2)}(x) = \frac{\sigma_\xi^2(x)}{dt}. \tag{16} $$
This means that if we use the jump-drift Eq. (11) to model diffusion processes, then estimating the jump amplitude $\sigma_\xi^2(x)$ yields the diffusion coefficient $D^{(2)}(x)$ required in Langevin modeling.
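Eqs. (13), (15) and (16) can be tried on the Ornstein-Uhlenbeck example used in this section (dt = 0.001, $D^{(1)}(x) = -x$, $D^{(2)}(x) = 1$). The following sketch, with an assumed series length and binning and hypothetical helper names, estimates $\sigma_\xi^2(x)$ and $\lambda(x)$ bin by bin from a purely diffusive path and then checks $\lambda(x)\,dt$ and $\sigma_\xi^2(x)/dt$.

```python
import math
import numpy as np

def simulate_ou(n_steps, dt, seed=0):
    """Euler-Maruyama path of dx = -x dt + dW, i.e. D1 = -x and D2 = 1."""
    rng = np.random.default_rng(seed)
    x = [0.0]
    for dw in rng.normal(0.0, math.sqrt(dt), n_steps).tolist():
        x.append(x[-1] - x[-1] * dt + dw)
    return np.array(x)

def jump_drift_estimates(x, dt, n_bins=20):
    """Bin-wise Eq. (13): sigma2 = M4 / (3 M2) and lam = M2 / sigma2."""
    x_now, dx = x[:-1], np.diff(x)
    edges = np.quantile(x_now, np.linspace(0.01, 0.99, n_bins + 1))
    idx = np.digitize(x_now, edges) - 1
    sigma2, lam = [], []
    for b in range(n_bins):
        d = dx[idx == b]
        M2 = np.mean(d ** 2) / dt
        M4 = np.mean(d ** 4) / dt
        s2 = M4 / (3.0 * M2)
        sigma2.append(s2)
        lam.append(M2 / s2)
    return np.array(sigma2), np.array(lam)

dt = 1e-3
x = simulate_ou(200_000, dt)
sigma2_hat, lam_hat = jump_drift_estimates(x, dt)
lam_dt = lam_hat * dt          # criterion (15): close to 1 for a diffusive process
D2_hat = sigma2_hat / dt       # Eq. (16): should recover D2 = 1
print(lam_dt.round(2), D2_hat.round(2))
```

For Gaussian increments $K^{(4)} = 3(K^{(2)})^2$, so λ dt = $3(K^{(2)})^2/K^{(4)}$ fluctuates around 1 in every bin, which is the behavior criterion (15) describes.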
To test the validity of the proposed model, we analyzed a synthetic diffusion time series with preset drift and diffusion coefficients, sampled at time interval dt. The diffusive process was generated by discretizing Eq. (3) with the Euler-Maruyama scheme [28], using a sampling interval dt = 0.001 and the functions $D^{(1)}(x) = -x$ and $D^{(2)}(x) = 1$ (the Ornstein-Uhlenbeck process).
Furthermore, to ensure that jump-drift modeling can reconstruct a time series x(t) that is statistically similar to the original diffusion time series, we generated a new data set by applying the estimated parameters to the jump-drift Eq. (11). We then estimated $D^{(1)}(x)$ and $D^{(2)}(x)$ from the reconstructed data and found very good agreement between these estimates and the original coefficients (see Fig. 2).

Jump-jump modeling
In this section, we extend the jump-drift dynamical Eq. (11) beyond a single jump process. We begin by considering two jump processes with two different amplitudes in Eq. (10). Before continuing, let us explain where the idea of including these two jump processes comes from.
The jump-diffusion Eq. (5), which can construct a trajectory with jump discontinuities, consists of a deterministic drift term and two stochastic terms with diffusive and jumpy behavior. When data are sampled at discrete time intervals from a jump-diffusion process, two types of discontinuities are observed in the path of the sampled time series: discontinuities that originate from finite sampling of the diffusive part of the process and have a smaller amplitude, and discontinuities that arise from genuine jump events and have a larger amplitude. On this basis, we build a new equation including a deterministic drift term and two stochastic terms with jumpy behavior. The aim of this article is to introduce this jump-jump equation, which enables us to generate sample paths with successive discontinuities drawn from two different size distributions. The jump-jump equation reads:
$$ dx(t) = D^{(1)}(x,t)\,dt + \xi_1\,dJ_1(t) + \xi_2\,dJ_2(t), \tag{17} $$
where $D^{(1)}(x,t)$ is the deterministic part of the process and $J_1(t)$ and $J_2(t)$ are Poisson jump processes. The jumps have rates $\lambda_1$ and $\lambda_2$ and sizes $\xi_1$ and $\xi_2$, which we assume to be zero-mean Gaussian with variances $\sigma_{\xi_1}^2$ and $\sigma_{\xi_2}^2$, respectively (or to follow any symmetric distribution with finite statistical moments). In general, the jump rates $\lambda_1$ and $\lambda_2$ and the amplitudes $\sigma_{\xi_1}^2$ and $\sigma_{\xi_2}^2$ may be functions of the state variable x and time t. We also assume that any discontinuity in the sample path is caused by exactly one of the jump events $dJ_1(t)$ or $dJ_2(t)$, i.e. two jumps do not occur simultaneously. This means that if, for example, $dJ_1(t)$ occurs in the time interval (t, t + dt] and takes the value 1, then $dJ_2(t)$ does not occur and is zero, and vice versa. This condition enables us to construct a time series via Eq. (17) whose trajectory consists of successive jump discontinuities with different amplitudes and jump rates. In other words, under this condition, Eq. (17) describes the random evolution of a jump-jump process, a process whose time series is the union of two data sets belonging to two jump processes with different amplitudes and rates. We now discuss a non-parametric approach to estimating the drift and jump characteristics directly from measured time series data. This method applies to both stationary and non-stationary time series.
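The non-overlap condition can be implemented by drawing, at each time step, a single categorical label that selects jump 1, jump 2 or (if the per-data-point rates sum to less than 1) no jump. The sketch below does this for any number of jump processes, assuming constant per-data-point rates and Gaussian sizes; `simulate_jump_jump` is a hypothetical helper, and the usage reuses the parameters of the first example in this article (dt = 0.01).

```python
import numpy as np

def simulate_jump_jump(drift, sigma2, p, dt, n_steps, x0=0.0, seed=0):
    """Euler-type simulation of dx = D1(x) dt + sum_i xi_i dJ_i with mutually
    exclusive jumps: in each step at most one dJ_i equals 1.

    sigma2[i]: amplitude of xi_i; p[i]: jump probability per data point (assumed constant)."""
    rng = np.random.default_rng(seed)
    sigma2, p = np.asarray(sigma2, float), np.asarray(p, float)
    if p.sum() > 1.0 + 1e-12:
        raise ValueError("per-data-point jump rates must sum to at most 1")
    k = len(p)
    probs = np.append(p, max(0.0, 1.0 - p.sum()))   # label k means "no jump"
    which = rng.choice(k + 1, size=n_steps, p=probs)
    amp = np.append(sigma2, 0.0)[which]             # amplitude of the jump that fired
    jumps = rng.normal(0.0, 1.0, n_steps) * np.sqrt(amp)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(n_steps):
        x[i + 1] = x[i] + drift(x[i]) * dt + jumps[i]
    return x, which

dt = 0.01
x, which = simulate_jump_jump(lambda v: -v, [0.2, 0.5], [0.6, 0.4], dt, 50_000)
jump_sizes = np.diff(x) + x[:-1] * dt      # remove the drift term D1(x) dt = -x dt
print(np.mean(which == 0), np.var(jump_sizes[which == 0]), np.var(jump_sizes[which == 1]))
```

Drawing one label per step enforces the exclusivity assumption by construction; when the per-data-point rates sum to 1, as here, some jump fires in every step.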

Non-parametric estimation of jump-jump processes
Theorem 2 For a jump-jump process described by the dynamical Eq. (17), all the functions and parameters required to model the process can be estimated non-parametrically from the KM coefficients of the measured time series as follows:
$$ M^{(1)}(x,t) = D^{(1)}(x,t), \qquad M^{(n)}(x,t) = \lambda_1(x,t)\,\langle\xi_1^n\rangle + \lambda_2(x,t)\,\langle\xi_2^n\rangle \quad (n \ge 2). \tag{18} $$
A proof of this theorem is presented in the Appendix. In this section, as before, we focus on stationary processes and drop the t-dependence in Eq. (18).
The five unknown functions required for this model are $D^{(1)}(x)$, $\lambda_1(x)$, $\lambda_2(x)$, $\sigma_{\xi_1}^2(x)$ and $\sigma_{\xi_2}^2(x)$. The first relation in this theorem gives the estimate of the drift coefficient, which equals the first-order KM coefficient:
$$ D^{(1)}(x) = M^{(1)}(x). \tag{19} $$
Additionally, from the last relation in Eq. (18) with n = 2, 4, 6, 8, we can derive a system of equations for the parameters of the jump processes (using the relation $\langle\xi^{2l}\rangle = \frac{(2l)!}{2^l\,l!}\langle\xi^2\rangle^l$ for the Gaussian random variables $\xi_1$ and $\xi_2$):
$$ M^{(2l)}(x) = \frac{(2l)!}{2^l\,l!}\left[\lambda_1(x)\,\sigma_{\xi_1}^{2l}(x) + \lambda_2(x)\,\sigma_{\xi_2}^{2l}(x)\right], \qquad l = 1, 2, 3, 4. \tag{20} $$
By solving this system of nonlinear equations, the unknowns $\lambda_1(x)$, $\lambda_2(x)$, $\sigma_{\xi_1}^2(x)$ and $\sigma_{\xi_2}^2(x)$ are estimated from $M^{(2)}(x)$, $M^{(4)}(x)$, $M^{(6)}(x)$ and $M^{(8)}(x)$, which are obtained from the data. Since the closed-form solution of this system leads to lengthy expressions, we do not present it and instead solve the system numerically.
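The authors solve system (20) numerically. As an alternative sketch (a swapped-in technique, not the authors' method), the system can also be reduced algebraically with a Prony-type elimination: writing $a_l = M^{(2l)}/c_l$ with $c_l = (2l)!/(2^l\,l!)$ gives $a_l = \lambda_1 s_1^l + \lambda_2 s_2^l$ with $s_i = \sigma_{\xi_i}^2$, so the $a_l$ satisfy a two-term linear recurrence whose characteristic roots are $s_1$ and $s_2$. The check below uses exact moments built from the first example's parameters rather than data; `solve_two_jumps` is a hypothetical helper.

```python
import math
import numpy as np

def solve_two_jumps(M2, M4, M6, M8):
    """Solve system (20) for (lam1, lam2, s1, s2) with s_i = sigma_xi_i^2 and s1 <= s2.

    Assumes Gaussian jump sizes: M^(2l) = c_l (lam1 s1^l + lam2 s2^l), c_l = 1, 3, 15, 105."""
    a1, a2, a3, a4 = M2 / 1.0, M4 / 3.0, M6 / 15.0, M8 / 105.0
    # a_{l+2} = e1 a_{l+1} - e2 a_l with e1 = s1 + s2 and e2 = s1 s2
    e1, e2 = np.linalg.solve([[a2, -a1], [a3, -a2]], [a3, a4])
    disc = e1 ** 2 - 4.0 * e2
    if disc < 0:
        raise ValueError("moments are inconsistent with two Gaussian jump processes")
    s1 = (e1 - math.sqrt(disc)) / 2.0
    s2 = (e1 + math.sqrt(disc)) / 2.0
    # with s1, s2 known, a1 and a2 are linear in the rates
    lam1, lam2 = np.linalg.solve([[s1, s2], [s1 ** 2, s2 ** 2]], [a1, a2])
    return lam1, lam2, s1, s2

# exact KM coefficients for the first example: per-data-point rates 0.6, 0.4 at dt = 0.01
dt = 0.01
lam_true, s_true = np.array([0.6, 0.4]) / dt, np.array([0.2, 0.5])   # rates per unit time
c = {2: 1, 4: 3, 6: 15, 8: 105}
M = {n: c[n] * np.sum(lam_true * s_true ** (n // 2)) for n in c}
lam1, lam2, s1, s2 = solve_two_jumps(M[2], M[4], M[6], M[8])
print(lam1 * dt, lam2 * dt, s1, s2)   # per-data-point rates and amplitudes
```

With moments estimated from finite data the discriminant can become negative, in which case the sketch raises an error instead of returning complex amplitudes.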
To demonstrate the validity of our approach, we estimated the drift and jump characteristics from synthetic time series generated with preset coefficients. First, we considered Eq. (17) with the linear drift function $D^{(1)}(x) = -x$ and two constant jump amplitudes $\sigma_{\xi_1}^2(x) = 0.2$ and $\sigma_{\xi_2}^2(x) = 0.5$, with constant jump rates per data point $\lambda_1(x) = 0.6$ and $\lambda_2(x) = 0.4$, respectively. Note that the jump rate per data point differs from the jump rate per unit time: $\lambda^{(\text{per data point})} = \lambda^{(\text{per unit time})}\,dt$. We generated synthetic time series by discretizing Eq. (17) with the Euler-Maruyama scheme and dt = 0.01. We then estimated the drift function and jump characteristics from the synthetic time series using relations (19) and (20). Very good agreement was observed between all estimates and the preset functions and parameters (see Fig. 3). As a second example, we considered Eq. (17) with the linear drift function $D^{(1)}(x) = -10x$ and two jump amplitudes $\sigma_{\xi_1}^2(x) = bx^2$ (b = 0.001) and $\sigma_{\xi_2}^2(x) = 1$, with constant jump rates per data point $\lambda_1(x) = 0.7$ and $\lambda_2(x) = 0.3$, respectively. We proceeded as before and generated an exemplary synthetic time series by discretizing Eq. (17) with the Euler-Maruyama scheme and a sampling interval dt = 0.001. Again, very good agreement was found between the estimated and preset functions and parameters (see Fig. 4).

Figure 3. (a) Synthetic time series generated by Eq. (17) with a time interval Δt = 0.01, a drift function $D^{(1)}(x) = -x$ and two constant jump amplitudes $\sigma_{\xi_1}^2(x) = 0.2$ and $\sigma_{\xi_2}^2(x) = 0.5$ with jump rates $\lambda_1(x) = 0.6$ and $\lambda_2(x) = 0.4$, respectively. (b) Estimated drift term and (c-f) estimated jump characteristics using relations in Eq. (20). The red lines are the corresponding theoretical coefficients.

Figure 4. (a) Synthetic time series generated by Eq. (17) with a time interval Δt = 0.001, a linear drift $D^{(1)}(x) = -10x$ and two jump amplitudes $\sigma_{\xi_1}^2(x) = 0.001x^2$ and $\sigma_{\xi_2}^2(x) = 1$ with jump rates $\lambda_1(x) = 0.7$ and $\lambda_2(x) = 0.3$, respectively. (b) Estimated drift coefficient. (c-f) Estimated jump characteristics using relations in (20). The red lines are the corresponding theoretical coefficients.

Jump-jump modeling with constant coefficients and parameters
Because of its practical relevance, in this section we focus on a special case of Eq. (17) in which all coefficients and parameters are constant, i.e. neither time- nor state-dependent. For this purpose, we rewrite Eq. (17) as follows:
$$ dx(t) = \mu\,dt + \xi_1\,dJ_1(t) + \xi_2\,dJ_2(t), \tag{21} $$
where μ is the drift parameter and the other parameters are as defined previously. Similar to the proof of Theorem 2, one can show that all the parameters required in this model are obtained non-parametrically by estimating the statistical moments of the increments of the measured time series:
$$ M^{(1)} = \mu, \qquad M^{(n)} = \lambda_1\langle\xi_1^n\rangle + \lambda_2\langle\xi_2^n\rangle \quad (n \ge 2), \tag{22} $$
where $M^{(n)} = \lim_{dt\to 0}\frac{1}{dt}\langle dx^n\rangle$ are the statistical moments of the increments of the time series, with dx = x(t + dt) − x(t). As before, we derive the following relations from Eq. (22):
$$ \mu = M^{(1)}, \qquad M^{(2l)} = \frac{(2l)!}{2^l\,l!}\left[\lambda_1\sigma_{\xi_1}^{2l} + \lambda_2\sigma_{\xi_2}^{2l}\right], \quad l = 1, 2, 3, 4. \tag{23} $$
By solving this system of equations, the 5 unknown parameters μ, $\lambda_1$, $\lambda_2$, $\sigma_{\xi_1}^2$ and $\sigma_{\xi_2}^2$ can be obtained. Again, to investigate the validity of this approach, we estimated these parameters from synthetic time series generated with known drift and jump parameters. We considered Eq. (21) with μ = 1 and two constant jump amplitudes $\sigma_{\xi_1}^2 = 1$ and $\sigma_{\xi_2}^2 = 0.3$, with constant jump rates per data point $\lambda_1 = 0.4$ and $\lambda_2 = 0.6$, respectively. We generated synthetic time series x(t) using the Euler-Maruyama scheme with a sampling interval dt = 0.001. A sample path of x(t) is shown in Fig. 5. In addition, we constructed a new time series y(t) from the increments of x(t), i.e. y(t) = x(t + Δt) − x(t) (the trajectory of y(t) is also shown in Fig. 5).
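Eq. (22) can be checked by computing the moments of the increments y(t) of a simulated path. This sketch uses the parameters quoted above (μ = 1, $\sigma_{\xi_1}^2 = 1$, $\sigma_{\xi_2}^2 = 0.3$, per-data-point rates 0.4 and 0.6, dt = 0.001); the series length and seed are assumptions. It compares the estimated $M^{(n)}$ with the values predicted by Eq. (22) rather than solving Eq. (23).

```python
import numpy as np

# Parameters from the example in the text; series length and seed are assumptions
rng = np.random.default_rng(3)
dt, n_steps = 1e-3, 1_000_000
mu, s, p = 1.0, np.array([1.0, 0.3]), np.array([0.4, 0.6])

# mutually exclusive jumps: each step fires jump 1, jump 2, or (if p.sum() < 1) none
which = rng.choice(3, size=n_steps, p=np.append(p, 1.0 - p.sum()))
amp = np.append(s, 0.0)[which]
y = mu * dt + rng.normal(0.0, 1.0, n_steps) * np.sqrt(amp)  # increments y(t)
x = np.cumsum(y)                                             # sample path x(t)

lam = p / dt                                   # jump rates per unit time
c = {2: 1, 4: 3, 6: 15, 8: 105}                # Gaussian factors (2l)!/(2^l l!)
M_hat = {n: np.mean(y ** n) / dt for n in (1, 2, 4, 6, 8)}
M_theory = {n: c[n] * np.sum(lam * s ** (n // 2)) for n in c}
M_theory[1] = mu                               # Eq. (22): M^(1) = mu
for n in (1, 2, 4, 6, 8):
    print(n, M_hat[n], M_theory[n])
```

The estimate of μ is comparatively noisy: the mean of y(t) divided by dt has a standard error of about $\sqrt{\langle\xi^2\rangle/N}/dt \approx 0.76$ for $N = 10^6$ points here, so a precise estimate of μ requires considerably longer series.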
By calculating the statistical moments of y(t) for n = 1, 2, 4, 6, 8, substituting them into Eqs. (23), and solving this system of equations, we obtained the following estimates, which are in very good agreement with the original values:

Expansion of the jump-jump equation
A strength of jump-jump modeling is that, if the amplitudes of fluctuations in a given time series are so diverse that its random evolution cannot be described by only two jump processes, as in Eq. (17) or (21), the stochastic part of Eq. (10) can be expanded by including more jump processes. For example, with three jump processes Eq. (21) becomes:
$$ dx(t) = \mu\,dt + \xi_1\,dJ_1(t) + \xi_2\,dJ_2(t) + \xi_3\,dJ_3(t). \tag{24} $$
As before, we assume that any random variation in the time series is due to exactly one of the jump events, and that two or more jump events do not occur simultaneously. That is, when $dJ_1(t)$ occurs in a time step and takes the value 1, $dJ_2(t)$ and $dJ_3(t)$ do not occur and are zero, and so on. The following section discusses a non-parametric approach to estimate the drift parameter μ and the jump characteristics $\lambda_1, \lambda_2, \lambda_3, \sigma_{\xi_1}^2, \sigma_{\xi_2}^2, \sigma_{\xi_3}^2$ required in this model.

Theorem 3: non-parametric estimation of jump-jump processes
Theorem 3 For a jump-jump process described by the dynamical Eq. (24), all the functions and parameters required to model the process can be estimated non-parametrically from the KM coefficients of the measured time series as follows:
$$ M^{(1)} = \mu, \qquad M^{(n)} = \lambda_1\langle\xi_1^n\rangle + \lambda_2\langle\xi_2^n\rangle + \lambda_3\langle\xi_3^n\rangle \quad (n \ge 2), \tag{25} $$
where all parameters and coefficients are as defined previously. A proof of this theorem is presented in the Appendix. As before, the first relation in (25) is used to estimate the drift coefficient, which equals the first-order KM coefficient:
$$ \mu = M^{(1)}. \tag{26} $$
On the other hand, using the last relation in Eq. (25) with n = 2, 4, 6, 8, 10, 12 and the relation $\langle\xi^{2l}\rangle = \frac{(2l)!}{2^l\,l!}\langle\xi^2\rangle^l$ for the Gaussian random variables $\xi_1$, $\xi_2$ and $\xi_3$, one can estimate the 6 unknown parameters $\lambda_1, \lambda_2, \lambda_3, \sigma_{\xi_1}^2, \sigma_{\xi_2}^2, \sigma_{\xi_3}^2$ by solving the following system of equations:
$$ M^{(2l)} = \frac{(2l)!}{2^l\,l!}\left[\lambda_1\sigma_{\xi_1}^{2l} + \lambda_2\sigma_{\xi_2}^{2l} + \lambda_3\sigma_{\xi_3}^{2l}\right], \qquad l = 1, \ldots, 6. \tag{27} $$
To demonstrate the validity of this model, we constructed a synthetic time series x(t) with a constant drift parameter μ = 5 and jump amplitudes $\sigma_{\xi_1}^2 = 0.2$, $\sigma_{\xi_2}^2 = 0.6$ and $\sigma_{\xi_3}^2 = 10$, with constant jump rates per data point $\lambda_1 = 0.3$, $\lambda_2 = 0.2$ and $\lambda_3 = 0.5$, respectively. We generated x(t) by discretizing Eq. (24) with the Euler-Maruyama scheme and a sampling interval dt = 0.001. A random path of x(t) and the corresponding increments y(t) = x(t + Δt) − x(t) are shown in Fig. 6.
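For three (or more) jump processes, the same Prony-type elimination generalizes to a Hankel linear system. The sketch below is an illustrative alternative to a general nonlinear solver, not the authors' procedure: the hypothetical helper `solve_k_jumps` solves system (27) for k jumps and is checked against exact moments built from the three-jump parameters above (dt = 0.001), not against simulated data.

```python
import math
import numpy as np

def gaussian_factor(l):
    """<xi^(2l)> / <xi^2>^l = (2l)! / (2^l l!) for a zero-mean Gaussian."""
    return math.factorial(2 * l) // (2 ** l * math.factorial(l))

def solve_k_jumps(M, k):
    """Solve M^(2l) = c_l * sum_i lam_i s_i^l (l = 1..2k) for lam_i and s_i = sigma_i^2."""
    a = np.array([M[2 * l] / gaussian_factor(l) for l in range(1, 2 * k + 1)])
    # a_{m+k} = sum_j coef_j a_{m+j} for m = 1..k: a Hankel system for the recurrence
    H = np.array([[a[m + j] for j in range(k)] for m in range(k)])
    coef = np.linalg.solve(H, a[k:2 * k])
    roots = np.roots(np.concatenate(([1.0], -coef[::-1])))  # z^k - sum_j coef_j z^j
    if np.any(np.abs(roots.imag) > 1e-9 * np.abs(roots).max()) or np.any(roots.real <= 0):
        raise ValueError("moments are inconsistent with k Gaussian jump processes")
    s = np.sort(roots.real)
    V = np.array([[si ** (m + 1) for si in s] for m in range(k)])  # a_m = sum_i lam_i s_i^m
    lam = np.linalg.solve(V, a[:k])
    return lam, s

dt = 1e-3
p_true = np.array([0.3, 0.2, 0.5])        # per-data-point rates from the text
s_true = np.array([0.2, 0.6, 10.0])       # jump amplitudes from the text
lam_true = p_true / dt                    # rates per unit time
M = {2 * l: gaussian_factor(l) * np.sum(lam_true * s_true ** l) for l in range(1, 7)}
lam, s = solve_k_jumps(M, 3)
print(lam * dt, s)
```

With moments estimated from data, high orders such as $M^{(12)}$ are dominated by the largest amplitude and carry large sampling errors, so the small amplitudes can be the hardest to resolve.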
Afterwards, by calculating the statistical moments of y(t) for n = 1, 2, 4, 6, 8, 10, 12 and substituting them into Eqs. (26) and (27), we estimated the drift parameter and the jump characteristics. The results confirm the effectiveness of the presented model:

Conclusion
We discussed that when one deals with data sampled at discrete times, one encounters successive discontinuities along the path of the sampled time series. The observation of such sequential discontinuities in the sample paths of empirical time series gave us the idea to develop a new model in which any random variation in the path is attributed to a jump event, even if the sampled time series belongs to the class of diffusive processes. On this basis, we introduced a new dynamical stochastic equation, the jump-jump equation, which includes a deterministic drift term and a combination of several Poisson jump processes with differently distributed sizes. The general form of this equation is:
$$ dx(t) = D^{(1)}(x,t)\,dt + \sum_{i} \xi_i\,dJ_i(t). $$
In this model we also assumed that jump events do not occur simultaneously, so the jumps do not overlap. We started with the simplest form of the equation, including a deterministic drift term and a single jump process as the stochastic component, and argued that it can describe the discrete-time evolution of a Langevin process. We also provided a measure to distinguish the type of underlying process (diffusive or jumpy) from the corresponding time series. We then increased the flexibility of the model by including more jump processes with differently distributed sizes. We also demonstrated that all unknown functions and parameters required for each model are estimated non-parametrically from the measured data set. It should be noted that depending on the number of data points and variety of the amplitude of fluctuations, the jump-jump

Figure 2. (a) Reconstructed data by jump-drift Eq. (11) using estimated parameters from Fig. 1 with time interval ∆t = 0.001.(b) Estimated drift coefficient, and (c) estimated diffusion coefficient, obtained from reconstructed data.The red lines are the initial coefficients.As can be seen, the good agreement between the estimated coefficients, and the original coefficients confirms that the jump-drift equation is able to describe the discrete time evolution of a diffusion process.